Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment (Supplementary Materials)

Authors

  • Jason Chuang
  • Sonal Gupta
  • Christopher D. Manning
Abstract

We focused on InfoVis research due to relevance, scope and familiarity. Analysis of academic publications is one of the common real-world uses of topic modeling (Griffiths & Steyvers, 2004). Our familiarity with the InfoVis community allowed us to contact experts capable of exhaustively enumerating its research areas. InfoVis has a single primary conference, simplifying the construction and analysis of its publications.

Similar Resources

Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment

The use of topic models to analyze domain-specific texts often requires manual validation of the latent topics to ensure that they are meaningful. We introduce a framework to support such a large-scale assessment of topical relevance. We measure the correspondence between a set of latent topics and a set of reference concepts to quantify four types of topical misalignment: junk, fused, missing, ...

Topic Cropping: Leveraging Latent Topics for the Analysis of Small Corpora

Topic modeling has gained a lot of popularity as a means for identifying and describing the topical structure of textual documents and whole corpora. There are, however, many document collections such as qualitative studies in the digital humanities that cannot easily benefit from this technology. The limited size of those corpora leads to poor quality topic models. Higher quality topic models ...

Improved Query Topic Models via Pseudo-Relevant Pólya Document Models

Query-expansion via pseudo-relevance feedback is a popular method of overcoming the problem of vocabulary mismatch and of increasing average retrieval effectiveness. In this paper, we develop a new method that estimates a query topic model from a set of pseudo-relevant documents using a new language modelling framework. We assume that documents are generated via a mixture of multivariate Pólya ...

Traffic Scene Analysis using Hierarchical Sparse Topical Coding

Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...

A toponym-based dual vector for topical relevance calculation in focused spatial crawling

A focused crawler is a Web crawler that tries to download only pages that are relevant to a given topic of interest (Siemiński 2009, Almpanidis 2011). That is, a focused crawler must calculate the relevance between pages and a specific topic (Rungsawang, 2005). Recently, the specific topic involving spatial information especially toponyms such as the topic about the Diaoyu Island...

Publication date: 2013